Non-blocking PUT in CHPL_COMM=ofi #25977
The following is the performance of the
Notes:
Previously, non-blocking PUTs were implemented via blocking PUTs, which could severely limit performance. Prior to 2.0, small PUTs invoked fi_inject_write, which essentially turned them into non-blocking PUTs, but chpl_comm_put returned as if the PUT had completed. This could cause MCM violations as well as hangs caused by not progressing the network stack properly. These deficiencies were fixed in 2.0, but the fix led to a performance regression. This commit implements non-blocking PUTs correctly, so that the chpl_comm_*nb* functions work correctly. This should restore 1.32.0 performance while avoiding MCM violations and hangs. Signed-off-by: John H. Hartman <jhh67@users.noreply.github.com>
Rewrote PUT logic so that low-level functions are non-blocking, and a blocking PUT is implemented by initiating a non-blocking PUT and waiting for it to complete. This simplifies the implementation and avoids code duplication. Signed-off-by: John H. Hartman <jhh67@users.noreply.github.com>
Allow specifying the maximum message size and maximum number of endpoints. These are intended primarily for testing. Signed-off-by: John H. Hartman <jhh67@users.noreply.github.com>
Also some code cleanup. Signed-off-by: John H. Hartman <jhh67@users.noreply.github.com>
We are now using this function to force visibility when an unbound endpoint is released, so it needs to work on unbound endpoints. Signed-off-by: John H. Hartman <jhh67@users.noreply.github.com>
Operations to force visibility are deferred until the endpoint is released, which requires the visibility bitmaps. Signed-off-by: John H. Hartman <jhh67@users.noreply.github.com>
Fixed how the number of transmit contexts needed is computed, and added some comments. Signed-off-by: John H. Hartman <jhh67@users.noreply.github.com>
Change type of numTxCtxs and numRxCtxs to size_t to match type of info->domain_attr->ep_cnt. Signed-off-by: John H. Hartman <jhh67@users.noreply.github.com>
@jhh67 : In your table above, would Chapel 2.2 be expected to essentially match the |
@bradcray: Yes.
This PR implements non-blocking PUTs in CHPL_COMM=ofi correctly. Prior to Chapel 2.2, non-blocking PUTs were implemented via blocking PUTs, and blocking PUTs would typically be "injected". The OFI injection functionality allows the source buffer to be reused immediately, and suppresses the completion event that normally indicates that the operation is complete. As a result, both blocking and non-blocking PUTs would incorrectly be considered complete when the function returned. Because injection allows the source buffer to be reused immediately, this would not cause a race on the source buffer, but it would allow PUTs to be reordered and could potentially cause hangs because the upper layers would not know that there were outstanding PUTs that might require progressing the communication endpoint. Although it doesn't appear that we saw these problems in production use, the potential existed.
This PR flips the situation around so that a blocking PUT is implemented as a non-blocking PUT followed by waiting for the PUT to complete. The lower-level communication primitives are now all non-blocking. PUTs may still be injected, but I will likely remove that functionality in a subsequent PR because 1) it doesn't make sense to inject a blocking PUT because you are going to wait for it to complete anyway, and 2) the semantics of non-blocking PUTs in Chapel are that the caller may not reuse the source buffer until the PUT completes. As a result, neither benefits from injection.
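To illustrate the layering described above, here is a minimal sketch in C of a blocking PUT built from a non-blocking PUT plus a wait. It is not the runtime's code: the "network" is simulated with a poll counter and `memcpy`, and every name (`put_handle_t`, `put_nb`, `put_test`, `put_wait`, `put_blocking`) is hypothetical and does not correspond to the actual `chpl_comm_*` or libfabric functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handle for an in-flight PUT (not a Chapel runtime type). */
typedef struct {
  void *dst;
  const void *src;
  size_t size;
  int polls_left;   /* simulated latency: polls needed before delivery */
  bool complete;
} put_handle_t;

/* Initiate a PUT and return immediately. Nothing has been delivered yet,
   so the caller must not reuse src until the handle reports completion. */
void put_nb(put_handle_t *h, void *dst, const void *src, size_t size) {
  h->dst = dst;
  h->src = src;
  h->size = size;
  h->polls_left = 2;
  h->complete = false;
}

/* Progress the simulated network once; return true when the PUT is done. */
bool put_test(put_handle_t *h) {
  if (!h->complete && --h->polls_left <= 0) {
    memcpy(h->dst, h->src, h->size);   /* data becomes visible here */
    h->complete = true;
  }
  return h->complete;
}

/* Wait for completion by repeatedly progressing the network. */
void put_wait(put_handle_t *h) {
  while (!put_test(h)) {
    /* a real runtime would yield to other tasks here */
  }
}

/* A blocking PUT is just a non-blocking PUT followed by a wait. */
void put_blocking(void *dst, const void *src, size_t size) {
  put_handle_t h;
  put_nb(&h, dst, src, size);
  put_wait(&h);
  assert(h.complete);
}
```

In this arrangement the blocking path adds no logic of its own, which is the simplification the PR is after: only the non-blocking primitive has to get completion tracking and progress right.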
As part of implementing this PR I changed the dangling PUT logic. A "dangling PUT" is one that may not be visible, and will need to be forced into visibility when required by the Chapel memory consistency model. Previously, PUTs were not allowed to dangle if transmit contexts were unbound, which led to a lot of special-case code to deal with bound and unbound transmit contexts. Now all dangling PUTs are treated the same whether or not the transmit contexts are bound, and dangling PUTs are forced into visibility when an unbound transmit context is freed. Forcing visibility could possibly be deferred until even later, but we expect unbound transmit contexts to be the exception, so it doesn't make sense to optimize them further. This change simplifies the code without degrading performance.
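The dangling-PUT bookkeeping can be pictured roughly as follows. This is a sketch under assumptions, not the runtime's data structures: it assumes a per-context bitmap with one bit per target node, caps the node count at 64 so one word suffices, and stands in for the real visibility-forcing network operation with a counter. The names `tx_ctx_t`, `note_put`, `force_visibility`, and `release_ctx` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 64   /* illustrative limit so one 64-bit word suffices */

/* Hypothetical transmit context with a visibility bitmap. */
typedef struct {
  bool bound;          /* true if permanently bound to one thread */
  uint64_t dangling;   /* bit n set: a PUT to node n may not be visible */
  int forced;          /* count of visibility operations (illustration) */
} tx_ctx_t;

/* Record that a PUT to `node` may be dangling. The same path is used
   whether or not the context is bound. */
void note_put(tx_ctx_t *ctx, int node) {
  assert(node >= 0 && node < MAX_NODES);
  ctx->dangling |= UINT64_C(1) << node;
}

/* Force every dangling PUT on this context into visibility. */
void force_visibility(tx_ctx_t *ctx) {
  for (int node = 0; node < MAX_NODES; node++) {
    uint64_t bit = UINT64_C(1) << node;
    if (ctx->dangling & bit) {
      /* placeholder for the network operation that forces visibility */
      ctx->forced++;
      ctx->dangling &= ~bit;
    }
  }
}

/* Releasing an unbound context forces visibility first; a bound context
   keeps its dangling PUTs until the memory consistency model requires
   them to be visible. */
void release_ctx(tx_ctx_t *ctx) {
  if (!ctx->bound) {
    force_visibility(ctx);
  }
}
```

The point of the sketch is that `note_put` has no bound/unbound special case; the only difference between the two kinds of context is the extra `force_visibility` call at release time.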